Abstract

Rank/Select dictionaries are data structures for an ordered set $S \subset \{0, 1, \dots, n-1\}$
that compute rank(x, S) (the number of elements in S which are no greater than x) and select(i, S)
(the i-th smallest element in S). They are fundamental components of succinct data structures
for strings, trees, graphs, etc. For those data structures, however, only asymptotic behavior
has been considered, and their performance on real data is not satisfactory. In this paper, we
propose four novel Rank/Select dictionaries, esp, recrank, vcode and sdarray, each of which
is small if the number of elements in S is small, and indeed close to $nH_0(S)$ ($H_0(S) \le 1$ is the
zero-th order empirical entropy of S) in practice, and whose query time is superior to that of the
previous ones. Experimental results reveal the characteristics of our data structures and also show
that these data structures are superior to existing implementations in both size and query time.

1 Introduction 

Rank/Select dictionaries are data structures for an ordered set $S \subset \{0, 1, \dots, n-1\}$ that support the
following queries:

• rank(x, S): the number of elements in S which are no greater than x, 

• select(i, S): the position of the i-th smallest element in S.
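As a point of reference, both queries can be answered directly on a sorted array of the elements of S. The following minimal sketch only fixes the conventions used in this paper (rank is inclusive, select is 1-based); the name `NaiveDict` is ours and is not one of the proposed data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Baseline Rank/Select dictionary over a sorted vector (illustrative only).
struct NaiveDict {
    std::vector<int> elems;  // elements of S in increasing order

    explicit NaiveDict(std::vector<int> s) : elems(std::move(s)) {
        std::sort(elems.begin(), elems.end());
    }

    // rank(x, S): number of elements of S that are <= x.
    int rank(int x) const {
        return static_cast<int>(
            std::upper_bound(elems.begin(), elems.end(), x) - elems.begin());
    }

    // select(i, S): the i-th smallest element of S, for 1 <= i <= |S|.
    int select(int i) const {
        assert(i >= 1 && static_cast<std::size_t>(i) <= elems.size());
        return elems[i - 1];
    }
};
```

This baseline needs about lg n bits per element and a binary search for rank; the structures discussed in this paper aim at less space with comparable or better query time.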

These data structures are used in succinct representations of several data structures. A succinct
representation is a method to represent an object from a universe with cardinality L by
$(1 + o(1)) \lg L$ bits. While this idea is very similar to the idea of data compression, the difference is
that succinct representations support fast queries on the object, such as enumerations or navigations.
Various succinct representation techniques have been developed to represent data structures such
as ordered sets [?], ordinal trees [?], strings [?], functions [?], and labeled trees [?]. All these
data structures are based on a succinct representation of Rank/Select dictionaries.

Many data structures have been proposed for Rank/Select dictionaries, most of which support
the queries in constant time on the word RAM [?] using $n + o(n)$ bits or $nH_0(S) + o(n)$
bits ($H_0(S) \le 1$ is the zero-th order empirical entropy of S). For most of these data structures,
however, only the asymptotic behavior has been considered, and their performance is not optimal for
real-size data. As a result, the query time is slow and the data structure size is large for real data.
Although some practical implementations of Rank/Select dictionaries using $n + o(n)$ bits have
recently been proposed [?], there is no practical implementation of those using $nH_0(S) + o(n)$ bits.
Recently, gap-based compressed dictionaries have been proposed [?]. They use another measure,
$\mathrm{gap}(S) := \sum_i \lceil \lg(\mathrm{select}(i+1, S) - \mathrm{select}(i, S)) \rceil$, to define the minimum space to store
S, and propose a data structure using $\mathrm{gap}(S) + O(m \log(n/m) / \log m) + O(n \log\log m / n)$ bits, which
is much smaller than the entropy-based ones if $m \ll n$, but it cannot support constant-time
rank and select queries because of the lower bound [?].

We introduce four novel Rank/Select dictionaries, esp, recrank, vcode and sdarray (sarray
and darray), each of which is based on a different idea and thus has different advantages and
disadvantages in terms of speed, size and simplicity. Their sizes are small if the number of elements in
S is small, and even close to the zero-th order empirical entropy of S, $H_0(S) \le 1$, which is defined
by $nH_0(S) = m \lg \frac{n}{m} + (n - m) \lg \frac{n}{n-m}$, where m is the number of elements in S.
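To get a feeling for the magnitude of this bound, the definition can be evaluated numerically. The helper below is a small sketch; the function name `entropy_bits` is ours, and terms with a zero coefficient are simply dropped.

```cpp
#include <cassert>
#include <cmath>

// n * H_0(S) in bits for a set of m elements drawn from a universe of size n:
// m * lg(n/m) + (n - m) * lg(n/(n - m)).
double entropy_bits(double n, double m) {
    assert(n > 0 && m >= 0 && m <= n);
    double bits = 0.0;
    if (m > 0) bits += m * std::log2(n / m);
    if (m < n) bits += (n - m) * std::log2(n / (n - m));
    return bits;
}
```

For example, `entropy_bits(10000, 100) / 10000` is about 0.0808, that is, a bit array with 1% ones needs roughly 8.08% of its verbatim size, which agrees with the $nH_0$ column reported in Table 2.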

Table 1 summarizes the properties of the proposed data structures for an ordered set $S \subset \{0, 1, \dots, n-1\}$
with m elements in terms of size and time for rank and select. We note that these bounds are
worst-case bounds and we can expect faster queries in practice. For example, the $O(\log^4 m / \log n)$ term in
sarray and darray and the $O(\log n)$ term in vcode are $O(1)$ in almost all cases.



Table 1: The space and time results for esp, recrank, vcode, sarray and darray for an ordered
set $S \subset \{0, 1, \dots, n-1\}$ with m elements. $H_0(S) \le 1$ is the zero-th order empirical entropy of S.

We conducted experiments using the proposed methods and previous methods, and show that our
data structures are fast and small compared to the previous ones.

2 Preliminaries 

In this paper we assume the word RAM model. Under the word RAM model we can perform
logical and arithmetic operations on two $O(\log n)$-bit integers in constant time, and we can also
read/write $O(\log n)$ consecutive bits of memory at any address in constant time.

An ordered set S, which is a subset of the universe $U = \{0, 1, \dots, n-1\}$, can be represented
by a bit-vector $B[0, \dots, n-1]$ such that $B[i] = 1$ if $i \in S$ and $B[i] = 0$ otherwise. We denote by m
the number of ones in B. Then rank(x, S) is the number of ones in $B[0, x]$, and select(i, S) is the
position of the i-th one from the left in B. These values are computed in constant time on the word
RAM using $O(n \log\log n / \log n)$-bit auxiliary data structures [16].

The above representation of S by a bit vector of length n is worst-case optimal
because there exist $2^n$ different subsets of the universe and we need $\lg 2^n = n$ bits to distinguish
them. We call this representation the verbatim representation. Similarly, a lower bound
on the size of the representation of S with m elements is $B(n, m) = \lceil \lg \binom{n}{m} \rceil$ bits. This value is
approximately $nH_0(B)$, which is further bounded by $nH_0(B) < m \lg \frac{n}{m} + 1.44m$ bits. Therefore
the size of the verbatim representation is far from this lower bound if $m \ll n$. Raman et al. [21]
proposed a constant-time Rank/Select data structure whose size is $B(n, m) + O(n \log\log n / \log n)$ bits,
which matches the above lower bound asymptotically.
The applications of Rank/Select dictionaries can be divided into two groups. One is for sets
with $m \approx n/2$ and the other is for sets with $m \ll n$. In this paper we call the former dense sets and
the latter sparse sets. Typical applications of dense sets are wavelet trees [?], which are used
for indexing strings, and ordinal trees. On the other hand, sparse sets are used in many succinct
data structures in order to compress pointers to blocks, each of which stores a part of the data.
Because in the word RAM model any $O(\log n)$ consecutive bits of data are accessed in constant time,
we often divide the data into blocks of $O(\log n)$ bits each. For example, an ordinal tree with n nodes
is encoded in a bit-vector of length 2n, and to support tree navigation operations, the bit-vector
is divided into blocks of length $\frac{1}{2} \lg n$ bits, and in each block we logically mark one bit to construct
a contracted tree with $O(n / \log n)$ nodes. These logical marks are represented by a bit-vector of
length 2n in which $4n / \lg n$ bits are one. The ratio of ones is $2 / \lg n$, that is, the vector is sparse. Such
vectors can be encoded in $B(2n, 4n/\log n) + O(n \log\log n / \log n) = O(n \log\log n / \log n) = o(n)$ bits.
Therefore storing a sparse vector in a compressed form is important for succinct data structures.

In this paper we mainly focus on sparse sets and on supporting the rank and select functions.
Although in some applications like wavelet trees we also need a select_0 function^1, we usually assume
dense sets in such applications, and well-developed Rank/Select dictionaries for dense sets can be
applied.

2.1 Previous Implementation of Rank/Select Dictionaries 

We first give a brief description of a Rank/Select dictionary using $n + o(n)$ bits, which is called
verbative. We conceptually partition B into subsequences of length $l := \log^2 n$ each, called
large blocks. Then each large block is partitioned into subsequences of length $s := \log n / 2$ each,
called small blocks. For the boundaries of large blocks we store a rank directory (results of rank) in
$R_l[0 \dots n/l]$ explicitly using $O(n / \log^2 n \cdot \log n) = O(n / \log n)$ bits. We also store a rank directory for
each boundary of small blocks in $R_s[0 \dots n/s]$, but here we store only values relative to the ones stored
for the large blocks, which takes $O(n \log\log n / \log n)$ bits.

Then rank is computed by $\mathrm{rank}(x, S) = R_l[\lfloor x/l \rfloor] + R_s[\lfloor x/s \rfloor] + \mathrm{popcount}(\lfloor x/s \rfloor \cdot s, x \bmod s)$,
where popcount(i, j) is the number of ones in $B[i \dots i+j]$, which can be calculated in constant
time using a pre-computed table of $O(\sqrt{n}\,\mathrm{polylog}\,n)$ bits or the popcount function [?]^2. For select
we have two options: the first is a constant-time solution using an o(n)-bit auxiliary data structure [?],
and the second is an $O(\log n)$ solution, which is a binary search using the rank function without any
auxiliary data structures [?]. Because of the lack of space we omit the details of constant-time
select [14].
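The two-level rank directory can be sketched as follows. This is a simplified illustration rather than the implementation evaluated in this paper: the block sizes are fixed (one 64-bit word per small block, eight words per large block) instead of depending on n, popcount is computed by a simple loop instead of a table, and the class name `VerbatimRank` is ours.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two-level rank directory over a plain bit vector.
class VerbatimRank {
public:
    explicit VerbatimRank(const std::vector<bool>& bits)
        : n_(bits.size()), words_((bits.size() + 63) / 64, 0) {
        for (std::size_t i = 0; i < n_; ++i)
            if (bits[i]) words_[i / 64] |= std::uint64_t{1} << (i % 64);
        std::uint64_t total = 0;     // ones before the current word
        std::uint64_t in_large = 0;  // ones since the start of the current large block
        for (std::size_t w = 0; w < words_.size(); ++w) {
            if (w % kWordsPerLarge == 0) {
                large_.push_back(total);  // absolute rank at a large block boundary
                in_large = 0;
            }
            small_.push_back(static_cast<std::uint16_t>(in_large));  // relative rank
            std::uint64_t c = popcount(words_[w]);
            total += c;
            in_large += c;
        }
    }

    // Number of ones in B[0..x] (inclusive), for 0 <= x < n.
    std::uint64_t rank(std::size_t x) const {
        assert(x < n_);
        std::size_t w = x / 64;
        unsigned r = static_cast<unsigned>(x % 64);
        std::uint64_t mask = (r == 63) ? ~std::uint64_t{0}
                                       : ((std::uint64_t{1} << (r + 1)) - 1);
        return large_[w / kWordsPerLarge] + small_[w] + popcount(words_[w] & mask);
    }

private:
    static constexpr std::size_t kWordsPerLarge = 8;

    static std::uint64_t popcount(std::uint64_t v) {
        std::uint64_t c = 0;
        for (; v != 0; v &= v - 1) ++c;  // clear the lowest set bit each round
        return c;
    }

    std::size_t n_;
    std::vector<std::uint64_t> words_;
    std::vector<std::uint64_t> large_;  // plays the role of R_l
    std::vector<std::uint16_t> small_;  // plays the role of R_s
};
```

Because the small-block entries are relative to the enclosing large block, 16 bits per entry are enough here, which mirrors the reason the small-block directory is cheaper than the large-block one.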

We next introduce a Rank/Select dictionary using $nH_0(S) + o(n)$ bits, which is called ent. The
main difference between verbative and ent is the representation of the bit-vector itself, that is, each
small block is encoded by the enumerative code as follows. Given t, the length of the block, and
u, the number of ones in the block, we calculate a sum of u binomial coefficients, one for each one in
the block, determined by t and by the position $p_i$ of the i-th one in the block. This value is a unique
number in $[0, \binom{t}{u} - 1]$ for each possible block of length t with u ones. This number can be
represented in $B(t, u) = \lceil \lg \binom{t}{u} \rceil$ bits, and the size of all
encoded blocks is less than $B(n, m) \le nH_0(S)$ [?]. We represent each small block by the result of
the enumerative code, and the total size is less than $nH_0(S)$. Since the code words have different sizes, we also
need to store pointers to the compressed small blocks, which take $O(n \log\log n / \log n) = o(n)$ bits.
Encoding and decoding are performed by using a pre-computed table of $O(\sqrt{n}\,\mathrm{polylog}\,n)$ bits.
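The enumerative code can be illustrated with the following sketch. It uses one standard convention, which is an assumption on our side and may differ in detail from the formula used in ent: positions are counted from the most significant of the t bits, and the code of a block is the number of blocks with the same length and the same number of ones that precede it in lexicographic order, i.e. $\sum_{i=1}^{u} \binom{t - p_i}{u - i + 1}$ for 1-based positions $p_1 < \dots < p_u$. Binomial coefficients come from a small Pascal triangle rather than from the decode table mentioned above; the class name `EnumCoder` is ours.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Enumerative coding of a t-bit block (t <= 32) with a known number of ones.
class EnumCoder {
public:
    explicit EnumCoder(int t)
        : t_(t), binom_(t + 1, std::vector<std::uint64_t>(t + 2, 0)) {
        assert(t >= 1 && t <= 32);
        for (int n = 0; n <= t; ++n) {
            binom_[n][0] = 1;
            for (int k = 1; k <= n; ++k)
                binom_[n][k] = binom_[n - 1][k - 1] + binom_[n - 1][k];
        }
    }

    // Index of `block` among all t-bit blocks with the same number of ones.
    std::uint64_t encode(std::uint32_t block) const {
        int r = ones(block);  // ones still to be placed
        std::uint64_t code = 0;
        for (int j = 0; j < t_ && r > 0; ++j) {
            if ((block >> (t_ - 1 - j)) & 1u) {
                code += binom_[t_ - 1 - j][r];  // blocks that have a 0 here instead
                --r;
            }
        }
        return code;
    }

    // Inverse of encode, given the number of ones u of the block.
    std::uint32_t decode(std::uint64_t code, int u) const {
        std::uint32_t block = 0;
        int r = u;
        for (int j = 0; j < t_ && r > 0; ++j) {
            std::uint64_t zero_here = binom_[t_ - 1 - j][r];
            if (code >= zero_here) {  // the block must have a 1 at this position
                block |= std::uint32_t{1} << (t_ - 1 - j);
                code -= zero_here;
                --r;
            }
        }
        return block;
    }

    static int ones(std::uint32_t v) {
        int c = 0;
        for (; v != 0; v &= v - 1) ++c;
        return c;
    }

private:
    int t_;
    std::vector<std::vector<std::uint64_t>> binom_;  // binom_[n][k] = C(n, k)
};
```

With t = 8 and u = 3, the codes range over 0 to 55, so 6 bits suffice instead of 8. The number of ones u has to be available separately, which is why decoding takes it as an argument.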

We note that although the size of ent is $nH_0(S) + o(n)$ bits, we cannot ignore the o(n) term
because the $nH_0(S)$ term is small compared to n if $m \ll n$, and the o(n) term is as large as $\Omega(nH_0(S))$.

^1 We do not discuss rank_0 since it can be computed from rank as rank_0(i, S) = i + 1 - rank(i, S).
^2 In this paper a mod b denotes $a - \lfloor a/b \rfloor \cdot b$.

3 Estimating Pointer Information 

We first propose esp (which stands for Estimating Pointer information), which does not need to store
pointer information because it estimates the pointers from rank information. Although the size of the
pointer information is $O(n \log\log n / \log n) = o(n)$ bits, it is actually as large as the $\Omega(nH_0(S))$ term
for real-size data.

First we show the propositions needed to bound the size of a compressed bit vector in
terms of rank information. Given a bit-vector $B[0 \dots n-1]$ with m ones, let $L(B)$ be the length
of the code word for B using the enumerative code [2] (see Section 2.1). Then,

Proposition 1 $L(B) \le nH_0(B)$.

This holds because $nH_0(B)$ is the size of a representation of the block that uses $\lg(n/m)$ bits for each 1 and
$\lg(n/(n-m))$ bits for each 0, whereas $L(B) = B(n, m) := \lceil \lg \binom{n}{m} \rceil$ is the smallest length of a
code that represents the bit vector.

Let $B_i$ ($i = 1, \dots, \lceil n/u \rceil$) be the partition of B into blocks, and let u be the size of each block. Then,

Proposition 2 $\sum_{i=1}^{\lceil n/u \rceil} L(B_i) \le \sum_{i=1}^{\lceil n/u \rceil} u H_0(B_i) \le nH_0(B)$.

The second inequality holds because the entropy function is concave.

Let $B' := B[0 \dots t]$ ($t < n$) be a prefix of the bit-vector B. Since the total length of the code words of
the blocks in B' is at most $(t+1) H_0(B')$ (by Prop. 1 and Prop. 2), we can store all code words of B' within
$(t+1) H_0(B')$ bits. However, since this is an inequality rather than an equality, the estimated pointers
have an error. We therefore need to insert gap bits so that we always estimate the correct pointer information.

We now explain the details of esp. Basically, esp is based on ent except for the existence of super
large blocks (SLB), which are needed to reset the estimation errors. We conceptually partition
B into subsequences of length k each, called super large blocks (SLB), where k is polylogarithmic in n
and larger than l. Then each SLB is partitioned into large blocks (LB) of length $l := \log^2 n$. Then each LB
is partitioned again into small blocks (SB) of length $s := \log n / 2$. We then encode each SB by the
enumerative code (Section 2.1) independently. The code word for the i-th SB, $SB_i$, is stored at the
position determined as follows. Let $l_r$ and $s_r$ be the results of rank for LB and SB, $l_r = R_l[x_l]$ and
$s_r = R_s[x_s]$, where $x_l = \lfloor x/l \rfloor$ and $x_s = \lfloor x/s \rfloor$. Then we estimate the starting positions of LB and SB as

$$l_p = l x_l \, H_0(LB'_{x_l}) = l_r \lg \frac{l x_l}{l_r} + (l x_l - l_r) \lg \frac{l x_l}{l x_l - l_r} \qquad (1)$$
$$s_p = s x_s \, H_0(SB'_{x_s}) = s_r \lg \frac{s x_s}{s_r} + (s x_s - s_r) \lg \frac{s x_s}{s x_s - s_r} \qquad (2)$$

where $LB'_{x_l}$ denotes the preceding LBs from the boundary of the SLB up to $LB_{x_l}$ and $SB'_{x_s}$ denotes
the preceding SBs from the boundary of the LB up to $SB_{x_s}$. Then the position of the compressed $SB_i$ is
$sl_p + l_p + s_p$, where $sl_p$ is the pointer information of the SLB, which is stored explicitly. We note that
no code words overlap (by Prop. 2) and gap bits are inserted automatically.

We store the rank directories for LB and SB and the pointer information for SLB. All of them are stored
in o(n) bits.

For rank(x, S), we look up the corresponding rank directories for LB and SB, $l_r = R_l[x_l]$ and
$s_r = R_s[x_s]$, where $x_l = \lfloor x/l \rfloor$ and $x_s = \lfloor x/s \rfloor$. Then we estimate the pointer information for LB
and SB as in (1) and (2). We then read the compressed bit representation of the SB from that position,
decode it in constant time, and do popcount as in verbative.

For select, we use the same approach as in [?], which takes constant time with o(n)-bit
auxiliary data structures.

In practice, since it is very slow to compute the logarithm of a floating-point number when
estimating the entropy, we use a pre-computed lookup table and a fixed-point integer
representation. We require two integer multiplications and one integer addition to estimate one value of
the entropy.



4 RecRank 

The second data structure, recrank, uses the reduction of a sparse bit-array into a contracted bit-
array and a denser extracted bit-array, which was originally used for Algorithm I in [?]. Here we
use the reduction recursively.

Given a bit-array $B[0 \dots n-1]$ with m ones, we conceptually partition B into blocks
$B_0, \dots, B_{n/t - 1}$ of length t. We call a block in which all elements are 0 a zero block (ZB), and a block
in which there is at least one 1 a non-zero block (NZ). The contracted bit-array of B, $B_c[0, \dots, n/t - 1]$,
is defined as a bit-string such that $B_c[i] = 0$ if $B_i$ is ZB and $B_c[i] = 1$ if $B_i$ is NZ, and the extracted
bit-array $B_e$ is defined as the bit-array formed by concatenating all NZ blocks of B in order.

We can calculate rank on B using $B_c$ and $B_e$ as

$$\mathrm{rank}(x, B) = \mathrm{rank}(\mathrm{rank}(\lfloor x/t \rfloor, B_c) \cdot t + (x \bmod t) \cdot B_c[\lfloor x/t \rfloor], B_e). \qquad (3)$$

We then recursively apply this reduction by considering the extracted bit-array as a new input bit-
array. We continue this process until the extracted bit-array is dense enough (the probability of ones
in the bit-array is larger than 1/4). After u applications of the reduction, we have u contracted bit-arrays
$B_c^1, B_c^2, \dots, B_c^u$ and the final extracted bit-array $B_e^u$.
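One level of this reduction can be sketched as follows. The sketch is an illustration under our own conventions: it counts with exclusive prefix lengths internally so that the result is the inclusive rank used in this paper, it pads the last block with zeros, and it replaces the constant-time rank structures on the contracted and extracted arrays by linear scans. All names (`Reduced`, `reduce`, `rank_reduced`, `prefix_ones`) are ours.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Bits = std::vector<bool>;

// Ones in b[0..x-1]; a linear scan stands in for a constant-time dense rank.
std::size_t prefix_ones(const Bits& b, std::size_t x) {
    std::size_t c = 0;
    for (std::size_t i = 0; i < x && i < b.size(); ++i)
        if (b[i]) ++c;
    return c;
}

struct Reduced {
    std::size_t t;  // block length
    Bits bc;        // contracted array: one bit per block (1 = non-zero block)
    Bits be;        // extracted array: concatenation of the non-zero blocks
};

Reduced reduce(const Bits& b, std::size_t t) {
    assert(t > 0);
    Reduced r{t, {}, {}};
    for (std::size_t start = 0; start < b.size(); start += t) {
        std::size_t end = start + t < b.size() ? start + t : b.size();
        bool nonzero = false;
        for (std::size_t i = start; i < end; ++i) nonzero = nonzero || b[i];
        r.bc.push_back(nonzero);
        if (nonzero)
            for (std::size_t i = start; i < start + t; ++i)
                r.be.push_back(i < b.size() ? b[i] : false);  // pad the last block
    }
    return r;
}

// Number of ones in B[0..x] (inclusive), answered from bc and be only.
std::size_t rank_reduced(const Reduced& r, std::size_t x) {
    std::size_t blk = x / r.t;
    assert(blk < r.bc.size());
    std::size_t nz_before = prefix_ones(r.bc, blk);  // non-zero blocks before blk
    std::size_t len = nz_before * r.t;               // bits of be fully covered
    if (r.bc[blk]) len += x % r.t + 1;               // plus the prefix inside blk
    return prefix_ones(r.be, len);
}
```

Applying `reduce` again to `be` gives the next level of the recursion; a query then descends one level per reduction.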

Here we take the strategy of making the contracted bit-arrays dense (the probability of ones in
the bit-array should be 1/2). Let $p(B) = m/n$ be the probability of ones in the bit-array B. We
choose the block size $t = -\frac{1}{\lg(1-p)}$ so that $p(B_c)$ is 0.5. This is because the probability
of t bits being all zero is $(1-p)^t$, and half of the elements in the contracted bit-array are one when
$(1-p)^t = 1/2$. Then the length of $B_c$ is $-n \lg(1-p)$ and the length of $B_e$ is $n/2$. We note that $B_e$
contains m ones and $p(B_e) = 2p$. This reduction is applied $u = -\lg p$ times so that the probability
of ones in the final extracted bit-array is larger than 1/4.

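Let T be the size of recrank and $p = 2^{-u}$. We can calculate T as follows.

$$
\begin{aligned}
T &= \sum_{i} |B_c^{i}| + |B_e| \\
  &= n \cdot \sum_{i=0 \dots u-2} \frac{-\lg(1 - 2^i p)}{2^i} + 2m && (4) \\
  &\le n \cdot \sum_{i=0 \dots u-2} \frac{2^i p + \frac{2}{3} (2^i p)^2}{2^i \log_e 2} + 2m && (5) \\
  &= \frac{1}{\log_e 2} \left( -m \lg p - 2m/3 - 2mp/3 \right) + 2m && (6)
\end{aligned}
$$

In (5), we use $-\log_e(1 - x) \le x + \frac{2}{3} x^2$ for $0 < x \le \frac{1}{4}$. In short, T is bounded by $1.44\, m \lg(n/m) + m$ bits.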

For rank, we apply (3) at each stage. Since the number of reductions is $-\lg p = \lg(n/m)$ and
each stage takes constant time, the total time is $O(\log(n/m))$. For select, we apply select at
each stage, each of which takes constant time [?], so the total time is also $O(\log(n/m))$.
int select_vc(int i) {  // return select(i, S)
    int b = i / p;  // block number
    int q = i % p;  // offset within the block
    int x = S[b] + q;
    for (int j = 0; j < T[b]; j++)  // count the ones among the first q bits of each bit plane
        x += popcount[V[b][j] & ((1U << q) - 1)] << j;
    return x;
}

Figure 1: An example code of select in vcode written in C++. V[b][j] contains $V_b[j]$ and
popcount[k] returns the number of set bits in the binary representation of k. Other variables correspond
to the definitions in the paper.

5 Vertical Code 

A Vertical Code (vcode) supports fast select and small space in practice because of its byte-
based operations and a novel orientation of data. This is a kind of opportunistic data structure,
that is, although it is not an entropy-compressed Rank/Select dictionary in the worst case, in most
cases its size is close to the zero-th order empirical entropy.

Given a bit-array $B[0 \dots n-1]$ with m ones, we first convert it into the gap sequence $d[0 \dots m-1]$,
$d[i] = \mathrm{select}(B, i+1) - \mathrm{select}(B, i) - 1$ ($d[0] = \mathrm{select}(B, 1)$), for $i = 0, \dots, m-1$.

We then partition d into blocks $B_1, \dots, B_{m/p}$ of size p, where p is polylogarithmic in n. Let
$T[0 \dots m/p - 1]$ be the array such that $T[i] = \lceil \lg(\max_{j=0 \dots p-1} d[ip + j] + 1) \rceil$, let $V_i[j]$ be the
bit-array of length p consisting of the j-th bits of the elements of d in the block $B_i$, and let
$S[0 \dots m/p - 1]$ be the array such that $S[i] = d[ip]$.
We note that every d in a block $B_i$ can be represented in $T[i]$ bits.

We describe how to get select(B, i) by using T, V and S. Let $b = \lfloor i/p \rfloor$ and $q = i \bmod p$.
Since $\mathrm{select}(B, i) = S[b] + q + \sum_{j=bp}^{bp+q-1} d[j]$, we count the number of ones in the first q bits of each of
$V_b[0], \dots, V_b[T[b] - 1]$, and then sum them up with shifts. Figure 1 shows example code for select in
vcode.
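The construction and the query can be put together in a self-contained sketch. It follows the vertical layout described above with p = 8, but it makes some simplifying assumptions of our own: elements are numbered from 0, the per-block base value (called `base_` here) is the position just after the last element of the previous block, and the query sums the gaps of the first q + 1 elements of the block. The class name `VCode` and the loop-based popcount are also ours.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class VCode {
public:
    static constexpr std::size_t kP = 8;  // gaps per block, so each bit plane is one byte

    // pos: increasing 0-based positions of the ones of the bit array.
    explicit VCode(const std::vector<std::uint32_t>& pos) : m_(pos.size()) {
        for (std::size_t b = 0; b * kP < m_; ++b) {
            std::size_t first = b * kP;
            base_.push_back(first == 0 ? 0 : pos[first - 1] + 1);
            std::uint32_t gaps[kP] = {0};
            std::uint32_t max_gap = 0;
            for (std::size_t q = 0; q < kP && first + q < m_; ++q) {
                std::size_t i = first + q;
                gaps[q] = (i == 0) ? pos[0] : pos[i] - pos[i - 1] - 1;
                if (gaps[q] > max_gap) max_gap = gaps[q];
            }
            std::vector<std::uint8_t> planes;  // planes[j] holds bit j of every gap
            for (unsigned j = 0; j < 32 && (max_gap >> j) != 0; ++j) {
                std::uint8_t plane = 0;
                for (std::size_t q = 0; q < kP; ++q)
                    plane |= static_cast<std::uint8_t>(((gaps[q] >> j) & 1u) << q);
                planes.push_back(plane);
            }
            v_.push_back(planes);
        }
    }

    // Position of the i-th one, counting from i = 0.
    std::uint32_t select(std::size_t i) const {
        assert(i < m_);
        std::size_t b = i / kP, q = i % kP;
        std::uint32_t x = base_[b] + static_cast<std::uint32_t>(q);
        unsigned mask = (1u << (q + 1)) - 1;  // selects gaps 0..q of the block
        for (std::size_t j = 0; j < v_[b].size(); ++j)
            x += popcount8(v_[b][j] & mask) << j;  // bit plane j has weight 2^j
        return x;
    }

private:
    static std::uint32_t popcount8(unsigned v) {
        std::uint32_t c = 0;
        for (; v != 0; v &= v - 1) ++c;
        return c;
    }

    std::size_t m_;
    std::vector<std::uint32_t> base_;
    std::vector<std::vector<std::uint8_t>> v_;
};
```

The number of iterations of the loop in `select` equals the number of bit planes of the block, which is why the query is fast whenever the gaps inside a block are small.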

A characteristic of vcode is that if we set p to a multiple of eight, all operations are byte-aligned.
Moreover, the cost of computing $\sum_{j=bp}^{bp+q-1} d[j]$ is small if T[b] is small. This idea is similar to the
gap-based compressed dictionary [?]. We encode gap information directly, and we can expect
the time of select to be small if the gaps are small. For example, the gaps of $\psi$ in compressed suffix arrays [24]
are very small.

For select, we need to do T[i] operations, each of which is done in constant time. Since T[i] can
become $O(\log n)$ in the worst case, the total time for select is $O(\log n)$. For rank and select_0, we
need to do a binary search over the m elements using select, which takes $O(\log n \cdot \log m)$ time
in the worst case.

The size of S is $O(\log n \cdot m / p) = o(n)$ bits. Since $d[i] < n$, each T[i] is bounded by $\lg n$,
so the size of T is bounded by $O(\log n \cdot m / p) = o(n)$ bits. The size of V is bounded by $m \lg n$ bits
in the worst case, which happens when a single gap in each block is large and the other d[i] are all 0. We note
that we can expect the size of V to be close to $m \lg(n/m)$ ($\approx nH_0(B)$) bits and the time of select to be
close to O(1) when adjacent elements in d have similar values.

6 SDarrays 

The idea of SDarrays (sdarray) is to use two different techniques, one for sparse sets and one for dense sets,
which enables us to design the data structures simply. We call the former sarray and the
latter darray (sarray uses darray as a part of its data structure).
int rank_sarray(int i) {  // return rank(i, B) in sarray
    int y = select_0(i >> w, H) + 1;  // start of the run of elements sharing the upper bits of i
    int x = y - (i >> w);             // number of elements with smaller upper bits
    for (int j = i % (1 << w); H[y] == 1; x++, y++) {
        if (L[x] >= j) {  // L holds the lower w bits of each element
            if (L[x] == j) x++;
            break;
        }
    }
    return x;
}

Figure 2: An example code of rank in sarray. Variables correspond to the definitions in the paper.

First we introduce sarray for sparse sets. Given a bit-array $B[0 \dots n-1]$ with m ones
($m \ll n$), we define $x[0 \dots m-1]$ such that $x[i] = \mathrm{select}(i+1, B)$. Each x[i] is then divided into
its upper $z = \lfloor \lg m \rfloor$ bits and lower $w = \lceil \lg(n/m) \rceil$ bits. The lower bits are stored explicitly in
$L[0 \dots m-1]$ using $m \cdot \lceil \lg(n/m) \rceil$ bits. The upper bits are represented by a bit-array $H[0 \dots 2m-1]$ such that
$H[\lfloor x[i] / 2^w \rfloor + i] = 1$ and the other bits are 0. By using H and L, we can calculate select in sarray by
$\mathrm{select}(i, B) = (\mathrm{select}(i, H) - i) \cdot 2^w + L[i]$. We need select for H. Here we can assume that H is
dense because there are m ones and m zeros in H.
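The split into upper and lower bits can be sketched as follows. This is an illustration under our own assumptions: elements are numbered from 0, w is chosen as the smallest integer with $m \cdot 2^w \ge n$, and select on H is a linear scan standing in for the darray structure. The class name `SArraySketch` is ours.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class SArraySketch {
public:
    // pos: increasing 0-based positions of the ones; n: length of the bit array.
    SArraySketch(const std::vector<std::uint64_t>& pos, std::uint64_t n)
        : m_(pos.size()), w_(0), high_(2 * pos.size(), false) {
        assert(m_ > 0 && n >= m_ && pos.back() < n);
        while ((m_ << w_) < n) ++w_;  // w = ceil(lg(n/m))
        for (std::size_t i = 0; i < pos.size(); ++i) {
            low_.push_back(pos[i] & ((std::uint64_t{1} << w_) - 1));  // lower w bits
            high_[(pos[i] >> w_) + i] = true;                         // upper bits
        }
    }

    // Position of the i-th one, counting from i = 0.
    std::uint64_t select(std::size_t i) const {
        assert(i < m_);
        std::size_t p = select1_high(i);  // index of the i-th one in H
        return ((p - i) << w_) | low_[i];
    }

private:
    // Linear scan standing in for a constant-time select on H.
    std::size_t select1_high(std::size_t i) const {
        for (std::size_t p = 0; p < high_.size(); ++p) {
            if (high_[p]) {
                if (i == 0) return p;
                --i;
            }
        }
        return high_.size();
    }

    std::uint64_t m_;
    unsigned w_;
    std::vector<bool> high_;           // H: upper bits in a unary-style encoding
    std::vector<std::uint64_t> low_;   // L: lower w bits of each element
};
```

With this choice of w, every index $\lfloor x[i]/2^w \rfloor + i$ is smaller than 2m, so H fits into 2m bits.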

We then explain darray for dense sets, $B[0 \dots n-1]$ with $m \approx n/2$ ones^3. We first partition B
into blocks such that each block contains L ones. Let $P_l[0 \dots n/L - 1]$ be the
array such that $P_l[i]$ is the position of the $(iL+1)$-th one. We classify these blocks into two groups.
If the length of a block ($P_l[i] - P_l[i-1]$) is larger than $L_2$, we store all the positions of its ones
explicitly in $S_l$. If the length of a block is smaller than $L_2$, we store the positions of every $L_3$-th
one in $S_s$. We can store each of these values in $\lg L_2$ bits.

For select(i, B) in darray, we look up $P_l[\lfloor i/L \rfloor]$ and check whether the block is larger than $L_2$ or
not. If it is, we look up the value in $S_l$, which is stored explicitly. If not, we look up the corresponding
$L_3$-th value in $S_s$ and then do a sequential search in the block, which takes $O(L_2 / \log n)$ time
because we can read $O(\log n)$ bits at once in the word RAM model. We note that if we can assume that the
ones are distributed uniformly in B, this sequential search is done in O(1) time. Although this data structure
concerns only select, we can use the same data structure for select_0 by inverting the bits of B at reading time.
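The two kinds of blocks can be illustrated with the following select-only sketch. It is a simplification under our own assumptions: L, L2 and L3 are plain constructor parameters rather than the asymptotic choices, positions are stored in full machine words rather than in $\lg L_2$ bits, elements are numbered from 0, and the sequential search inspects one bit at a time. The class name `DArraySketch` is ours.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class DArraySketch {
public:
    DArraySketch(const std::vector<bool>& b, std::size_t L, std::size_t L2, std::size_t L3)
        : b_(b), L_(L), L3_(L3), m_(0) {
        assert(L > 0 && L3 > 0);
        std::vector<std::size_t> pos;  // positions of all ones, used only while building
        for (std::size_t i = 0; i < b.size(); ++i)
            if (b[i]) pos.push_back(i);
        m_ = pos.size();
        for (std::size_t s = 0; s < m_; s += L) {
            std::size_t e = std::min(s + L, m_);  // ones s..e-1 form one block
            Block blk;
            blk.first = pos[s];
            blk.is_long = pos[e - 1] - pos[s] > L2;
            for (std::size_t k = s; k < e; k += (blk.is_long ? 1 : L3))
                blk.data.push_back(blk.is_long ? pos[k] : pos[k] - pos[s]);
            blocks_.push_back(blk);
        }
    }

    // Position of the i-th one, counting from i = 0.
    std::size_t select(std::size_t i) const {
        assert(i < m_);
        const Block& blk = blocks_[i / L_];
        std::size_t k = i % L_;
        if (blk.is_long) return blk.data[k];            // stored explicitly
        std::size_t p = blk.first + blk.data[k / L3_];  // nearest sampled one
        for (std::size_t left = k % L3_; left > 0; --left) {
            ++p;
            while (!b_[p]) ++p;  // sequential search inside a short block
        }
        return p;
    }

private:
    struct Block {
        std::size_t first;              // position of the first one of the block
        bool is_long;                   // true if the block spans more than L2 positions
        std::vector<std::size_t> data;  // all positions (long) or sampled offsets (short)
    };

    std::vector<bool> b_;
    std::size_t L_, L3_, m_;
    std::vector<Block> blocks_;
};
```

Long blocks are rare because each of them covers more than L2 positions, so storing their ones explicitly stays cheap, while short blocks bound the length of the sequential search.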

For rank in darray, we use the same method as in verbative. For rank(i, B) in sarray (see
the example code in Figure 2), we first calculate $y = \mathrm{select}_0(\lfloor i/2^w \rfloor, H) + 1$ to find the smallest
element which is greater than $\lfloor i/2^w \rfloor \cdot 2^w$. Then we count the number of elements which are equal to
or smaller than i by sequentially searching over H and L in $O(n/m)$ time, because the number of possible bit
patterns of length $\lg(n/m)$ is $n/m$. If we use binary search, we can do it in $O(\log(n/m))$ time, but this
is slower than sequential search in practice, so we use sequential search.
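The sequential search can be written as a standalone function over the arrays H and L. This is a sketch under our own conventions: elements are numbered from 0, the result is the inclusive rank (number of elements no greater than x), and the zero that starts the run is found by a linear scan instead of a constant-time select_0. The function name `sarray_rank` is ours.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of stored elements <= x, given H (upper bits), L (lower w bits) and w < 64.
std::size_t sarray_rank(const std::vector<bool>& H,
                        const std::vector<std::uint64_t>& L,
                        unsigned w, std::uint64_t x) {
    std::uint64_t hi = x >> w;
    std::uint64_t lo = x & ((std::uint64_t{1} << w) - 1);
    // Move y just past the hi-th zero of H, i.e. to the start of the run of
    // elements whose upper bits equal hi.
    std::size_t y = 0;
    std::uint64_t zeros = 0;
    for (; zeros < hi && y < H.size(); ++y)
        if (!H[y]) ++zeros;
    std::size_t cnt = y - static_cast<std::size_t>(zeros);  // elements in earlier runs
    // Scan the run and count the elements whose lower bits do not exceed lo.
    while (y < H.size() && H[y] && L[cnt] <= lo) {
        ++cnt;
        ++y;
    }
    return cnt;
}
```

The scan touches only the elements that share the upper bits of x, which is the reason the expected cost is small when the elements are spread evenly.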

The size of $P_l$ is $O(\frac{n}{L} \cdot \log n)$ bits, that of $S_l$ is at most $\frac{n}{L_2} \cdot L \lg n$ bits, and that of $S_s$ is at most
$\frac{n}{L_3} \lg L_2$ bits. When we choose $L := O(\log^2 n)$, $L_2 := O(\log^4 n)$, and $L_3 := O(\lg n)$, all the sizes of
$P_l$, $S_l$ and $S_s$ are o(n). In summary, the size of darray is $n + o(n)$ bits.

We then analyze the size of sarray. We use $m \cdot \lceil \lg(n/m) \rceil$ bits for L. For H of length 2m, we
use the data structure of darray, which takes $2m + o(m)$ bits. Therefore the total size of sarray is
$m \cdot \lceil \lg(n/m) \rceil + 2m + o(m)$ bits.

^3 The size of H in sarray is 2m, not n. Here we explain darray in the general case.
Figure 3: Size of the data structures.

Figure 4: Time for 100,000,000 random rank operations.



7 Experimental Results 

We conducted experiments using esp (esp), recrank (rr), vcode (vc), sarray (sa) and darray
(da). We also compare them with the byte-based implementation in [?] (Kim), its
re-implementation by us (Kim2), and the implementation in [?] (navarro). For esp, the block lengths k, l and s
were set to powers of two.

Figure 5: Time for 100,000,000 random select operations.



Table 2: The space results for esp, recrank, vcode, sarray and Navarro for bit arrays of length
n with 1% and 5% ones. The values are the percentages of the size of each data structure relative to
the original bit-array.

Ratio of 1's   esp     recrank   vcode   sarray   nH0     Navarro
1%             17.02   15.83     15.05   10.13    8.08    137.5
5%             42.67   49.32     62.25   40.59    28.64   137.5


For vc, we used p = 8. For sa, the parameters L, L2 and L3 were set to powers of two.

For select in rr, we used the $O(\log n)$ solution because the o(n)-bit auxiliary data would become large.
For rank and select in sa and da, we used sequential search in H because it is faster in practice.

We used GCC 3.4.3 with the options -O6 -m64. We measured time using the ftime function on a 3.4GHz Xeon
with 8GB of main memory.

All experiments were done using bit arrays of length 10M ($10 \cdot 2^{20}$) bits.

Figure 3 shows the sizes of the data structures. We also show
$nH_0(B)$, which is the lower bound on the size of a data structure if we only know the ratio of ones. From this
we can see that the size of esp is very close to $nH_0(B)$ in all conditions. We also find that the sizes
of rr, sa and vc are very close to $nH_0(B)$ when the ratio of ones is very small.

Table 2 shows the sizes of the data structures for bit-arrays with 1% and 5% ones. We
find that the sizes of the proposed data structures are indeed close to $nH_0$. We note that sarray is the
smallest in both cases.

Figure 4 shows the result of $10^8$ rank operations. We can see that Kim2, Navarro and da are the
fastest; they use the same rank as in verbative. On the other hand, vc is the slowest for rank
because it needs a binary search using the select function. Only rr becomes slower when the ratio of ones
is small, because its computation cost is $O(\log(n/m))$, which grows as m decreases.

Figure 5 shows the result of $10^8$ select operations. Among the methods, sa is the fastest in all
conditions. As in the result of rank, rr is slower when the ratio of ones is small. We also find that da
shows different behavior when the ratio of ones is small because it switches data structures depending on the
ratio of ones. We note that esp is fast for rank and select when the ratio of ones is small or
large, since esp employs a decode table for the enumerative code which is prepared only for compressible
blocks. Therefore it becomes slower when the blocks are not compressible.

We did not show the results for bit arrays of different lengths because of the lack of space. We
note that, except for Navarro, rr and vc, which use binary search in select, all methods show
similar results for bit arrays of different lengths.

8 Concluding Remarks 

In this paper, we have proposed four novel Rank/Select dictionaries, esp, recrank, vcode and
sdarray. Experimental results show that the sizes of these data structures are indeed close to the
zero-th order empirical entropy and that they achieve fast queries.

We also note that they are easy to implement (except esp): recrank uses a reduction
which can employ well-developed techniques for dense sets, vcode converts the problem into
popcount on bytes, and sdarray separates the problem into dense sets and sparse sets, all of which simplify
the problem.

In the next stage of our research, we will extend our results to more complex data structures,
such as sequences over large alphabets. We will also consider applications which employ the appropriate
data structures, and apply them to data compression as well.